Multi-document Summarization Using Support Vector Regression
Abstract
Most multi-document summarization systems follow an extractive framework built on a variety of features. As increasingly sophisticated features are designed, combining them sensibly becomes a challenge. The features are usually combined by a linear function whose weights are tuned manually. In this task, a Support Vector Regression (SVR) model is used to combine the features automatically and to score the sentences. Two important problems inevitably arise. The first is how to acquire the training data: several automatic generation methods are introduced, based on the standard reference summaries written by humans. The second is feature selection, in which features are picked out and combined into different feature sets for testing. Using the DUC 2005 and 2006 data sets, comprehensive experiments are conducted over various SVR kernels and feature sets. The trained SVR model is then used in the main task of DUC 2007 to produce the extractive summaries.
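The abstract does not say which automatic generation methods were used to turn reference summaries into training data. As a minimal sketch of the general idea, the Python below assigns each candidate sentence a regression target equal to its average unigram recall against the human reference summaries. The overlap measure, the function names (`tokenize`, `overlap_score`, `toy_features`, `build_training_pairs`) and the two toy features are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(sentence, references):
    """Average clipped unigram recall of `sentence` against each reference.

    Assumption: this stands in for whichever sentence-to-reference
    similarity the original system used as its regression target.
    """
    sent_counts = Counter(tokenize(sentence))
    scores = []
    for ref in references:
        ref_counts = Counter(tokenize(ref))
        total = sum(ref_counts.values())
        if total == 0:
            continue  # skip empty references
        matched = sum(min(c, sent_counts[w]) for w, c in ref_counts.items())
        scores.append(matched / total)
    return sum(scores) / len(scores) if scores else 0.0


def toy_features(index, sentence):
    """Two hypothetical features: a position score and the token count."""
    return [1.0 / (index + 1), len(tokenize(sentence))]


def build_training_pairs(sentences, references):
    """Return (feature_vector, target_score) pairs, one per sentence."""
    return [
        (toy_features(i, s), overlap_score(s, references))
        for i, s in enumerate(sentences)
    ]


if __name__ == "__main__":
    refs = ["the cat sat on the mat"]
    for features, target in build_training_pairs(["The cat sat.", "Dogs bark."], refs):
        print(features, target)
```

The resulting pairs could then be passed to any SVR implementation as training inputs and targets. At summarization time, the fitted regressor would score unseen sentences from the same kind of feature vectors.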
Similar resources
Building a Trainable Multi-document Summarizer
This paper describes an approach to building a trainable multi-document summarization system, using a simple training process based on support vector machines. The summarization system is trained and tested using the DUC 2005 data set. The evaluation results based on ROUGE scores are presented and methods for improving the performance of the summarization system are identified.
Extractive Multi-Document Summarization with Integer Linear Programming and Support Vector Regression
We present a new method to generate extractive multi-document summaries. The method uses Integer Linear Programming to jointly maximize the importance of the sentences it includes in the summary and their diversity, without exceeding a maximum allowed summary length. To obtain an importance score for each sentence, it uses a Support Vector Regression model trained on human-authored summaries, w...
TGSum: Build Tweet Guided Multi-Document Summarization Dataset
The development of summarization research has been significantly hampered by the costly acquisition of reference summaries. This paper proposes an effective way to automatically collect large numbers of news-related multi-document summaries with reference to social media’s reactions. We utilize two types of social labels in tweets, i.e., hashtags and hyperlinks. Hashtags are used to cluster doc...
A SVM-Based Ensemble Approach to Multi-Document Summarization
In this paper, we present a Support Vector Machine (SVM) based ensemble approach to combat the extractive multi-document summarization problem. Although SVM can have a good generalization ability, it may experience a performance degradation through wrong classifications. We use a committee of several SVMs, i.e. Cross-Validation Committees (CVC), to form an ensemble of classifiers where the stra...
Automatic Annotation Techniques for Supervised and Semi-supervised Query-focused Summarization
In this paper, we study one semi-supervised and several supervised methods for extractive query-focused multi-document summarization. Traditional approaches to multi-document summarization are either unsupervised or supervised. The unsupervised approaches use heuristic rules to select the most important sentences, which are hard to generalize. On the other hand, a huge amount of annotated data is ...